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ABSTRACT 


— -  V  taplacian  Smoothing  Splines  (LSS)  are  presented  as  generalizations  of 
graduation,  cubic  and  thin  plate  splines.  The  method  of  generalized  cross 
validation  (GCV)  to  choose  the  smoothing  parameter  Is  described.  GCV  is  used 
in  the  algorithm  for  the  computation  of  LSS's.  An  outline  of  a  computer 
program  which  implements  this  algorithm  is  presented  along  with  a  description 
of  the  use  of  the  program.  Examples  in  one,  two  and  three  dimensions 
demonstrate  how  to  obtain  estimates  of  function  values  with  confidence 
intervals  and  estimates  of  first  and  second  derivatives.  Probability  plots 
are  used  as  a  diagnostic  tool  to  check  for  model  inadequacy.  r.  . 
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1.  Motivation 

A  Laplacian  smoothing  spline  (LSS)  Is  a  statistical  tool  used  to  model  a 
smooth  but  otherwise  unknown  function.  The  fitted  spline  provides  an  analytic 
function  which  may  be  utilized  to  estimate  derivatives,  integrals  or  values  of 
the  underlying  function.  For  data  analysis  purposes  a  graphical  display  of 
the  fitted  spline  {or  cross  sections  for  multidimensional  problems)  often 
provides  insight  which  might  otherwise  remain  masked  by  the  irregularly 
spaced,  multidimensional  and  "noisy"  data.  The  residuals,  which  are  the 
observed  values  of  the  dependent  variable  minus  the  corresponding  fitted 
spline  values,  may  be  utilized  as  an  aid  in  model  checking.  A  probability 
plot  of  the  residuals  provides  a  vehicle  to  detect  possibly  discrepant 
observations  (outliers).  With  the  above  ideas  as  the  eventual  objective  we 
first  elucidate  the  functional  form  of  the  LSS  and  then  describe  an  algorithm 
for  its  computation. 

When  someone  mentions  a  line,  cosine  or  an  exponential  we  all  have  a 
visual  image  of  "feel"  for  the  function  In  question.  Using  the  following 
example  we  hope  to  provide  an  Intuitive  feeling  for  an  LSS. 

In  one  dimension  Imagine  a  long,  thin,  perfectly  rigid  rod  (a  line)  lying 
on  a  frictionless  plane  with  coordinate  axes  (t,z).  We  represent  this  rod  as 
a  function  of  t,  say  g(t).  Assume  that  we  are  given  N  points  in  the  plane 
{(t,z):(t,z)«(ti  ,zi ),  1*1,...,N}.  The  ti  are  considered  to  be  distinct  and 
known  without  error.  The  z\  are  measurements  of  a  true  but  unknown  function  f 
evaluated  at  tj  plus  some  "noise"  e^.  The  e^  are  independent  random 
variables,  each  having  mean  zero  and  finite  variance. 
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With  the  previous  setup  imagine  that  an  ideal  spring  is  attached  to  data 

point  ( t i ,  z i )  and  to  the  rod  (ti,g(tf))  for  each  1,  1*1 . N.  This  fixes  the 

springs  to  remain  parallel  to  the  ordinate  axis.  What  position  will  the  rod 
g(t)  assume? 

Physics  provides  a  means  to  answer  this  question.  The  rod  will  assume 
the  position  which  minimizes  the  energy  of  the  springs.  The  energy  of  an 
ideal  spring  is  equal  to  some  positive  constant  (called  the  spring 
constant)  times  the  square  of  the  length  it  is  stretched.  Thus  the  cumulative 
energy  of  the  N  springs  Is 

N 

z  Mzi  -  g(t1))2  . 

1*1 

This  is  minimized  when  g  is  the  least  squares  line  (provided  we  restrict  g  to 
be  rigid)  therefore  the  least  squares  line  Is  the  position  the  rod  will  assume 
if  ki  *  k0,  i*l,...,N,  k0  some  constant.  If  the  k^  are  not  all  equal  then  the 
rod  will  assume  the  position  of  the  weighted  least  squares  line.  Notice  that 
this  spring  idea  provides  an  Intuitive  explanation  for  minimizing  the  residual 
sum  of  squares  in  regression. 

The  situation  Is  analogous  In  two  dimensions:  a  thin  plate  of  infinite 
rigidity  (not  bendable)  would  assume  the  position  of  the  least  squares  plane. 
The  situation  In  three  dimensions,  although  not  as  easy  to  visualize,  is 
analogous.  There  are  further  restrictions  on  the  ti  which  are  rigorously 
given  in  (2.6). 

We  have  thus  far  assumed  that  the  rod  Is  rigid.  This  Is  not  necessary 
and  may  not  be  a  good  representation  of  the  physical  phenomenon  under 
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consideration.  So  we  relax  the  rigidity  assumption  and  assume  that  the  rod  Is 
flexible.  If  zero  energy  were  required  to  flex  the  rod  then  the  minimum 
energy  position  which  the  rod  would  assume  Is  that  of  a  function  of 
interpolation.  Since  the  residuals  are  zero,  this  configuration  has. zero 
energy  and  thus  Is  a  minimum.  By  this  explanation  It  Is  readily  seen  that  the 
function  thus  obtained  is  not  unique.  This  anomaly  will  be  alleviated  by 
requiring  energy  to  flex  the  rod. 

Consider  the  more  realistic  case  where  the  rod  is  flexible  and  takes 
energy  to  flex.  The  spring  of  a  diving  board  is  testimony  to  this.  Note  that 
the  bending  energy  of  a  rod  is  (p/o2)J2(g),  where  p/a2  is  a  constant  and 

02(g)  3  /[g(2)(x)]2dx  .  d.i) 

Therefore  the  bending  energy  Is  proportional  to  curvature  which  may  be 
measured  as  J2(g)  In  (1.1). 

t 

To  find  the  position  which  the  rod  will  assume  under  these  conditions  is 
equivalent  to  finding  the  function  g  which  will  minimize  the  total  energy  of 
the  system 

N 

E  Mzi  -  9(ti))Z  +  (p/°2)  J2(9)  (1.2) 

1*1 

or  equivalently  the  mlnlmlzer  of 
N 

(1/M)  2  °2M21  -  9(ti))2  ♦  (P/N)J2(g)  .  (1.3) 

1-1 

The  function  from  a  certain  class  of  functions,  X,  which  minimizes  (1.3) 
can  be  shown  to  be  a  piecewise  cubic  spline.  The  function  space  X  Is 
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rigorously  defined  In  Wahba  and  Wendelberger  (1980).  Here  X  should  be  thought 
of  as  a  space  of  smooth  functions  which  map  Rd  Into  R*.  There  Is  much 
literature  about  cubic  splines  In  one  dimension.  To  this  author's  knowledge 
the  earliest  work  on  LSS's  is  that  of  Schoenberg  (1964);  other  Important  work 
on  splines  Is  given  In  Craven  and  Wahba  (1979),  Duchon  (1976),  Prenter  (1975), 
and  Relnsch  (1967). 

The  one  dimensional  case  generalizes  to  two  dimensions.  In  two 
dimensions  the  splines  are  called  thin  plate  splines  because  of  the  analogy  of 
minimizing  the  energy  of  a  thin  plate  of  infinite  extent.  The  earliest 
suggested  application  of  thin  plate  smoothing  splines  seems  to  have  been  by 
Harder  and  Desmarals  (1972).  They  suggested  that  spring  forces  may  be  applied 
at  the  points  of  interpolation.  This  inspired  the  spring  analogy  given  here. 
This  spring  concept  Is  equivalent  to  LSS's  in  either  one  or  two  dimensions 
(with  m=2  in  (2.1)).  Much  recent  work  on  LSS's  has  been  done  by  Wahba  (see 
Wahba  (1979)  and  the  references  cited  there). 

In  two  dimensions  J2{g)  becomes 

•  *  2  / 2I  32g(xi,X2) 

J2(g)  *  /  /  1  lv/[ - r— ]2  dxi  dx2  •  (1*4) 

--  --  v*0  '  3xi w  3x2Z"v 

J2(g)  1s  proportional  to  the  bending  energy  of  a  thin  plate  (under  simplifying 
assumptions);  for  details  see  Melnguet  (1979).  However,  In  two  dimensions  the 
solution  Is  no  longer  a  piecewise  cubic  but  rather  takes  the  form 
N 

g(t)  -  l  c-j '  x-j 21  n ( t-|  )  +  do  +  dixi  +  d2X2  ,  (1.5) 

1»1 

where  is  the  Euclidean  distance  between  t  and  tj,  that  Is  t^2  «  | t-t i  | 2 
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s  (til  ~  xl)2  +  (ti2  -  x2)2;  tjj  Is  the  j th  component  of  tj,  >1,2, 
t  *  (xi,x2);  Cf‘  and  dv  are  constants,  i*l,...,N,  v=0,l,2. 

To  aid  in  understanding  (1.5)  the  function  TQ2ln(TQ)  Is  plotted  in  Figure 
1.1  for  to  a  (0,0)  and  x2  =  0.  Rotation  of  this  function  around  the 
ordinate  axis  and  centering  at  the  point  tj  will  produce  the  radially 
symmetric  function  T^2ln(T^).  Using  (1.5)  an  LSS  is  seen  to  be  composed  of  a 
linear  combination  of  these  radially  symmetric  functions  plus  a  plane.  The 
plane  has  zero  bending  energy  but  generally  does  have  nonzero  spring  energy. 
Linear  combinations  of  the  radially  symmetric  functions  can  be  forced  to 
interpolate  the  points  and  hence  may  have  zero  spring  energy  but  generally 
have  nonzero  bending  energy.  This  tradeoff  between  bending  and  spring  energy, 
or  smoothness  and  infidelity  to  the  data  (terminology  of  Wahba  (1979)),  leads 
one  to  consider  the  minimization  problem  of  Section  2  as  a  generalization  of 
these  ideas.  The  one  and  two  dimension  examples  with  m*2  are  special  cases  of 
this  generalization. 

We  see  that  the  motivation  for  one  and  two  dimensional  LSS's  is  quite 
simple  (at  least  for  m»2).  Attach  springs  to  the  data  points,  constrain  them 
to  lie  perpendicular  to  the  independent  variable  space  Rd,  then  let  the  curve 
or  surface  conform  by  simple  bending  to  the  minimum  energy  configuration. 

The  Laplacian  smoothing  spline  was  suggested  by  Ouchon  (1976)  as  a 
multidimensional  generalization  of  the  thin  plate  (or  “plaques  minces”),  d-2, 
interpolating  spline.  An  LSS  is  also  a  multivariate  generalization  of  the  one 
dimensional,  d-1,  "graduation"  spline  of  Schoenberg  (1964).  Furthermore,  the 
"graduation"  spline  is  a  generalization  of  the  familiar  cubic  smoothing 
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spline.  The  terminology  "Laplacian  smoothing  spline"  was  suggested  by 
Professor  I.  J.  Schoenberg.  An  explanation  for  using  the  term  "Laplacian"  is 
given  in  Wahba  (1979). 
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2.  Characterization 

Let  z\  *  f(tf)  +  ej,  1»1,...,N.  The  tjeRd  are  known  exactly.  We  assume 
that  the  function  f  is  smooth  but  otherwise  unknown.  By  smooth  It  Is  meant 
that  the  function  Is  well  approximated  by  a  function  geX;  X  Is  rigorously 
defined  in  Wahba  and  Wendelberger  (1980).  X  may  be  thought  of  as  a  space  of 
functions  which  approximate  well  a  large  class  of  functions  of  which  f  is  a 
member.  The  e^  are  independent,  zero  mean  and  finite  variance  random 
variables  with  variance-covariance  matrix  o2D02  =  <j2diag(oi2,...,ofj2).  Here 
a2  is  an  unknown  constant.  For  example,  if  we  know  that  all  the  variances  are 
equal  then  we  may  take  1.0  *  a\2  *  ...  =  a^2  in  what  follows.  The  aj2  used 
here  are  inversely  proportional  to  the  kj  of  Section  1,  that  is,  k-j  =  (aer-j)-2. 
The  tf-j 2  may  be  thought  of  as  relative  weights  of  the  measurement  errors  ej. 

The  zi  are  observed  dependent  variables  in  R*  and  the  corresponding  ti  are 
independent  variables  in  Rd,  i«l,...,N. 

A  Laplacian  smoothing  spline  is  the  function  g  which  is  the  solution  to 
the  problem. 

Find  geX,  X  a  suitable  function  space,  such  that 

N-l||0o-1(z-g)||2  +  (p/N)Jm(g)  (2.1) 

attains  its  minimum.  Here  define 

z  =  (zi,...,zn)t,  g  *  (9l,...,gN)T.  91  a  9Ui)»  I  iDo"1  U-9)  1 1 2 

*  (z-g)Toa-2(z-g),  Da-1  =  dlagfoj-1 . on"1)  , 

where  superscript  T  means  transpose  throughout.  Also, 

3mg(t) 

M'  m!  *  -  ' 

dm(9)  “  T  ——————  /,../  [  -  ]2dxj , . . .dx<j ;  (2.2) 

v«l  aj>v! . adfV!-»  -»  3xi°l,v,. . .  .Sx^.v 
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T  m+d-1 

t  *  (xj,... ,xd) 1 ;  M*  ■  (  d-1  );  the  •••»®d,v  are  the  M'  unique 

combinations  of  {0,1, . ...m}  such  that  ®i>v+. ..+®d,v  *  m. 

In  the  case  presented  earlier  with  d*2  and  m*2  we  have  M'=3  and 
(al,v»a2,v)  takes  on  the  M'  unique  values  (1,1),  (2,0)  and  (0,2).  In  this 
case  (2.2)  reduces  to  (1.4). 

The  solution  to  the  minimization  problem  is  unique  and  given  in  (2.3). 

N  M 

g(t)  =  Z  Cj9m  dTj2m-d(intj)Ie(d)  +  Z  dv4>v(t)  >  (2.2 

i=l  v-1 

where  Ie  is  the  indicator  function  of  even  integers,  that  is  Ie(d)=l,  for  d 

even  and  le(d)=0,  for  d  odd; 

/  (-i)d/2+l+m/(22m-l1,d/2(m_i)i(n,-d/2)i)>  d  even 
9m  d  s  (2.< 

’  \  r(d/2-m)/(22'M/2(m-l)!),  d  odd 

and  4>v  are  the  polynomials  of  total  degree  less  than  m, 

Plv  Pdv 

$v(t)  a  •frv, ( x 2. , •  •  •  »xd)  *  XI  . . . xd  .  (2.E 

Here  the  are  unique;  p-fv  >  0,  i=l . d  and  pjv+» •  •  +Pdv  <  m»  v=l,...,M, 

M  =  I"*  d  Define  the  M  by  d  matrix  P  to  have  ivth  element  Piv.  Also, 
2m-d  >  0  and  (2.6)  holds. 


Z  av$v(ti)  *  0,  1*1,. ...N  Implies  av  =  0,  v*l,...,M  .  (2.6) 

v*l 

(Condition  (2.6)  requires  that  the  matrix  T„  of  Section  5  step  (11)  be  of  rank 

M. )  c  *  (ci,...,cn)t  and  d  *  (dj . d^)T  are  obtained  by  solving  the  linear 

system 


(K  +  po20o2)c  +  Td  *  z 
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and 

TTc  =  0  .  (2.8) 

In  (2.7)  K  is  the  N  by  N  matrix  with  i jth  element 
9m,dTi j2m-d (ln(Tij))Ie(<1).  In  (2.7)  and  (2.8)  T  is  the  N  by  M  matrix  with 
i„th  element  4>v( t-j ) .  In  (2.7)  Da2  is  the  N  by  N  diagnal  matrix  with  iith 
entry  a^z.  o2  is  an  unknown  proportional ity  constant  which  along  with  p  is 
absorbed  into  X  using  NX  *  pa2  to  yield  (2.9)  from  (2.7). 

(K  +  NXDa2)c  +  Td  =  z  (2.9) 

The  approach  of  Harder  and  Oesmarais  (1972)  provides  us  with  a  physical 
interpretation  of  the  parameters  at  least  in  the  d=2  case.  p=NXar2  is  the 
plate  “rigidity"  which  is  a  constant.  The  value  of  p  depends  on  the  material 
and  the  thickness  of  the  plate.  The  spring  constant  kj  is  equal  to  the 
reciprocal  of  the  variance  or  (oaj)-2.  The  "load"  at  the  jth  point  is 
Pj  =  pcj  =  (aaj)-2rj  *  kjrj,  where  rj  is  the  unnormalized  or  unsealed  residual 
at  that  point;  i.e.,  rj  =  zj  -  g(tj),  j*l,...,N  or  r  =  z  -  Kc  -  Td. 

For  a  discussion  of  a  more  general  problem  and  the  derivation  of  the 
solution  the  reader  is  referred  to  Wahba  and  Wendelberger  (1980).  We  note 
here  that  if  the  ei  are  not  independent  but  instead  have  positive  definite 
covariance  matrix  proportional  to  2  then  Da2  and  Da_1  are  everywhere  replaced 
by  r  and  the  symmetric  Inverse  square  root  E-1/2  to  obtain  the  solution. 

To  this  point  we  have  assumed  knowledge  of  the  smoothness  parameter  X. 
However  it  is  generally  unknown.  Before  describing  a  method  to  dynamically 
choose  x  from  the  data  at  hand  we  provide  an  example  to  exhibit  its  influence 


on  the  LSS. 
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3.  Example  1  — Variation  of  the  LSS  with  X,  d»l. 

A  company  which  makes  and  repairs  small  computers  wants  to  forecast  the 
number  of  service  engineers  that  it  will  require  over  the  next  few  years.  To 
do  this  requires,  among  other  things,  knowledge  of  the  length  of  a  service 
call.  The  length  of  a  call  Is  a  function  of  the  number  of  components  within 
the  computer  which  must  be  repaired  or  replaced.  The  information  in  Table  3.1 
was  collected  on  24  service  calls;  the  data  are  from  Chatter jee  and  Price 
(1977).  We  would  like  to  fit  a  spline  to  the  data  in  order  to  forecast  the 
length  of  a  service  call. 

We  fit  a  spline  to  the  data  using  the  algorithm  given  in  Section  5.  The 
smoothness  parameter,  X,  is  dynamically  chosen  from  the  data  using  the  method 
of  generalized  cross  validation  (6CV).  By  showing  the  influence  of  x  on  the 
LSS  of  this  example  we  hope  to  provide  a  clearer  understanding  of  the  role  of 
GCV  in  choosing  the  smoothness  parameter.  The  results  of  the  following 
sections  will  be  easier  to  understand  with  this  example  in  mind.  Exactly  what 
the  GCV  choice  of  X  is  will  be  presented  in  Section  4. 

Figure  3.1  shows  a  plot  of  the  data  and  the  corresponding  spline  for  five 
different  values  of  x.  Because  there  are  only  24  observations  of  which  only 
17  have  unique  Independent  variables  we  should  not  be  surprised  if  the  GCV 
estimate  (to  be  described  in  Section  4)  of  X,  which  is  a  large  sample  result, 
does  not  perform  well.  The  confidence  Intervals  are  calculated  using  method 
of  Wahba  (1981);  the  formula  used  for  their  computation  is  given  in  Example  2 
of  Section  6. 
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TABLE  3.1 

EXAMPLE  1  -  REPAIR  TIMES 


Length  of  Calls 
(Minutes) 

23 

29 

49 

64 

74 

87 

96 

97 
109 
119 
149 
145 
154 
166 
162 
174 
180 
176 
179 
193 
193 
195 
198 
205 


Units  Repaired 
(Number) 

1 

2 

3 

4 

4 

5 

6 
6 

7 

8 
9 
9 

10 

10 

11 

11 

12 

12 

14 

16 

17 

18 
18 
20 


I 
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Considering  the  brief  explanation  of  the  problem  given  here  the  GCV 
choice  of  X,  as  used  in  Figure  3.1c,  seems  reasonable  to  use  in  predicting  the 
number  of  minutes  spent.  The  GCV  choice  of  X  appears  to  be  the  most  visually 
pleasing  and  consistent  with  how  we  would  expect  the  number  of  minutes  spent 
on  a  service  call  to  be  related  to  the  number  of  computer  components  repaired. 


18 


4.  Generalized  Cross  Validation 

In  the  example  of  Section  3  the  smoothing  parameter  X  is  unknown.  To 
determine  an  estimate  of  this  parameter  Craven  and  Wahba  (1979)  and  Wahba  and 
Mold  (1979)  have  suggested  the  use  of  generalized  cross  validation.  A  short 
synopsis  of  the  development  of  this  method  Is  given  to  enhance  the 
understanding  of  it. 

The  method  of  cross  validation  (presented  here  as  related  to  LSS's)  is 
developed  in  response  to  the  question:  How  well  may  one  expect  LSS's  to 
predict  the  true  functional  value  g(t)  at  some  point  t? 

Simple  cross  validation  (SCV)  suggests  predicting  the  true  functional 
values  of  data  different  from  that  used  in  the  analysis  to  assess  this 
predictive  ability.  In  its  simplest  form  this  entails  dividing  the  sample 
into  two  pieces  of  similar  size  using  one  section  for  optimization  and  the 
other  for  testing.  In  addition  to  this,  in  order  to  gain  more  information 
from  the  data,  the  two  pieces  may  be  interchanged  and  the  optimization  and 
testing  performed  on  each. 

SCV  is  alright  if  there  is  an  ample  supply  of  data  so  that  halving  or 
doubling  it  has  little  effect  on  the  quality  of  the  estimator.  To  lessen  this 
effect  Mosteller  and  Tukey  (1968)  propose  single  cross  validation  (1CV), 
(called  ordinary  cross  validation  by  Mahba  (1979)),  which  is  described 
suitably  by  them  as  follows: 

"Suppose  that  we  set  aside  one  individual  case,  optimize  for  what  is 

left,  then  test  on  the  set-aside  case.  Repeating  this  for  every  case 

squeezes  the  data  almost  dry.  If  we  have  to  go  through  the  full 
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optimization  calculation  every  time,  the  extra  computation  may  be  hard  to 

face.  Occasionally,  one  can  easily  calculate,  either  exactly  or  to  an 

adequate  approximation,  what  the  effect  of  dropping  a  specific  and  very 

small  part  of  the  data  will  be  on  the  optimized  result.  This  adjusted 

optimized  result  can  then  be  compared  with  the  values  for  the  omitted 

individual.  That  is,  we  make  one  optimization  for  all  the  data,  followed 

by  one  repetition  per  case  of  a  much  simpler  calculation,  a  calculation 

of  the  effect  of  dropping  each  individual,  followed  by  one  test  of  that 

individual.  When  practical,  this  approach  is  attractive." 

To  describe  1CV  mathematically  we  require  some  notation.  Let  g\Ci)  be 

the  solution  to  the  minimization  of  (2.1)  with  the  jt(l  point  removed  from  the 

analysis.  Similarly,  Da(J)  is  the  N-l  by  N-l  matrix  composed  of  0o  with  its 

jth  row  and  coiumn  removed.  To  "test  on  the  set  aside  case"  we  require  that 

-  Zj)/oj]2  be  small.  "Repeating  this  for  every  case"  and 

averaging  to  yield  an  overall  test  gives 

Vm°(*)  -  (1/M)  *  C(9A^)(tj)  -  Zj)/oj]2  .  (4.1) 

J“1 

1CV  uses  the  \  which  minimizes  Vm°(A). 

To  minimize  Vm°(A)  directly  is  not  a  trivial  computational  matter.  For 
each  proposed  value  of  \  a  system  of  the  form  (2.8)  and  (2.9)  (of  order  N+M-l 
instead  of  N+M)  must  be  solved  for  each  of  the  N  values  left  out  of  the 
analysis.  This  entails  solving  a  linear  system  of  order  N+M-l  N  times!  As 
noted  earlier  "if  we  have  to  go  through  the  full  optimization  calculation 
every  time,  the  extra  computation  may  be  hard  to  face."  Following  the  idea  of 
Mosteller  and  Tukey  we  seek  a  computational  simplification  for  the  mlnlmlzer 
of  VmO(\). 
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The  simplified  form  for  1CV  was  first  noted  by  Craven  and  Wahba  (1979) 
and  given  in  a  slightly  more  general  form  in  Wahba  and  Wendelberger  (1980). 

The  1CV  function  may  be  written 
N 

Vm°(X)  »  (1/N)  S^tgxttj)  -  Zj)/(oj(l-ajj(X)))]2  .  (4.2) 

ajj(X)  is  the  jth  diagonal  element  of  A^x)  which  is  defined  by 
/gx(ti)  \ 

Am(x)/  *[  .  1 

\ 9x(^N)  / 

where  g\  Is  the  solution  of  (2.1).  A^X)  may  be  thought  of  as  mapping  the 
vector  z  Into  the  smoothed  values. 

In  this  form  "we  make  one  optimization  for  all  the  data"  by  calculating 
gx  then  "followed  by  one  repetition  per  case  of  a  much  simpler  calculation,  a 
calculation  of  the  effect  of  dropping  each  Individual."  Here  find  ajj(x)  and 
use  (4.2). 

Evaluation  of  this  formulation  of  Vm°(X)  Involves  solving  a  linear  system 
of  size  N+M  to  find  gx  and  one  of  size  N  to  find  ajj(X).  This  Is  a 
considerable  Improvement  over  that  of  using  (4.1)  directly.  Because  of  a 
mathematical  simplification  the  amount  of  computation  needed  to  minimize 
Vmo(x)  can  be  substantially  reduced.  From  a  practical  point  of  view  this 
makes  the  use  of  cross  validation  very  attractive. 

When  applying  cross  validation  to  problems  other  than  LSS's  this  last 
step  of  finding  "what  the  effect  of  dropping  a  specific  and  very  small  part  of 
the  data  will  be  on  the  optimized  result"  Is  very  Important  and  should  not  be 
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overlooked.  In  fact,  this  step  often  makes  cross  validation  computationally 
feasible  whereas  without  this  insight  it  may  be  impractical. 

Finding  the  minimizer  of  Vm°(X)  requires  its  evaluation  at  different 
values  of  x  as  determined  by  a  search  routine.  Hence,  although  the 
minimization  is  possible  we  need  to  repeatedly  solve  large  linear  systems  with 
the  number  of  solution  times  being  a  function  of  the  search  routine  employed. 

In  Vmo(X)  of  (4.1)  each  deviation  of  gx^(tl)  from  the  observed  value  Zf 
is  treated  symmetrically.  This  choice  is  arbitrary  and  is  chosen  for 
simplicity.  A  more  general  approach  is  to  weight  each  term  of  (4.1)  or 
equivalently  (4.2)  to  yield 
N 

Vm(X)  =  (1/N)  £  Wi[(gx(ti)  -  z1)/(a1(l-aii(X)))]2.  (4.3) 

i  =  l 

Before  a  discussion  of  the  choice  of  these  weights  the  following  definition  is 
needed. 

Definition: 

N 

*m(*)  =  E(l/«)  I  C(f(tf)  -  gx(ti))/ai]2 
i*l 

is  the  expected  weighted  (by  o^)  mean  squared  error  between  the  true  function 
(f)  and  the  spline  (gx)  evaluated  at  the  Independent  variables  (t -f ) .  Here  E 
denotes  mathematical  expectation  with  respect  to  the  error  distribution  of  the 
random  errors  as  described  in  the  model  of  Section  2. 

If  we  want  Rm(x)  to  be  small  then  the  generalized  cross  validation  value 
of  X  should  be  used  as  the  smoothing  parameter  value.  Using  1CV  as  motivation 
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Craven  and  Wahba  (1979)  and  Golub,  Heath  and  Wahba  (1979)  have  shown  that  the 
X  which  minimizes  Vm(X)  with  weights 

N 

Mi  =  (l-aii(X))2/(l-N-1  2  ajj(X))2 

j*l 

is  an  estimate  of  the  X  which  minimizes  Rm(X).  Using  these  weights  in  (4.3) 
gives  the  generalized  cross  validation  function  (GCVF) 

N  N 

Vm(*)  ■  (1/M)  =  C(gx(tl)-21)/(oi(l-N-1  2  ajj(X)))]2  .  (4.4) 

1=1  '  j=l 

The  minimizer  of  (4.4)  is  called  the  GCV  estimate  of  X. 

The  GCVF  can  be  rewritten  as 

W)  =  (1/M)  |  ( Oct-  1  (I  -  Am(X))z||2/((l/N)Tr(I-Am(M))2  J  (4.5) 

where  Tr  is  the  trace. 

Wahba  (1981)  has  proposed 

oe2  =  | |D0-l(I.AR(X))z| |2/Tr(I-An(X))  (4.6) 

as  an  estimate  of  the  error  variance  a2.  This  leads  us  to  consider 

dfe  =  Tr(I-Atn(X))  as  the  degrees  of  freedom  of  error.  Using  these  notions  we 

rewrite  the  GCVF  as 

Vm(*)  =  Noe2/dfe  .  (4.7) 

The  method  of  GCV  may  be  viewed  as  minimizing  the  estimated  error 
variance  per  error  degrees  of  freedom.  This  may  further  be  thought  of  as  a 
form  of  parsimonious  model  selection. 


r 
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In  the  next  section  we  see  that  the  computation  of  Vm(X)  Is  reduced  to 
essentially  the  singular  value  (or  eigenvalue-eigenvector)  decomposition  of  a 
symmetric  positive  definite  N-M  by  N-M  matrix  (M  Is  usually  a  small  integer). 
The  above  decomposition  makes  it  possible  to  form  Vm(x)  by  simple  scalar 
operations  for  each  value  of  X.  Thus  we  have  taken  the  Ideas  of  Mosteller  and 

Tukey  one  step  further.  This  algorithm  is  much  simpler  than  the  original 

analysis  at  essentially  the  cost  of  a  one  time  eigenvalue-eigenvector 
decomposition;  i.e.,  changing  the  dependent  variable  (but  not  the  Independent 
variables)  does  not  necessitate  another  spectral  decomposition.  Thus,  many 
data  sets  which  have  identical  independent  variables  but  different  dependent 
variables  may  be  analyzed  quite  easily  and  inexpensively. 

When  using  GCV  with  a  small  sample  size  we  may  run  into  problems.  The 
most  frequent  small  sample  problem  with  GCV  is  that  x  *  0  or  X  *  «  is  chosen 

when  physical  considerations  dictate  that  It  should  not  be.  x  *  0  implies 

that  we  are  interpolating  the  dependent  variable.  This  should  be  done  if  the 
true  underlying  rigidity  p  Is  zero.  X  equal  to  infinity  Implies  that  we  are 
fitting  a  polynomial  of  degree  m-1  by  least  squares.  This  should  be  done  if 
either  the  variance  is  large  (relative  to  the  dependent  variable)  or  if  the 
true  underlying  rigidity  is  infinite  (i.e.,  the  true  model  is  a  polynomial). 

If  it  is  clear  from  other  considerations  that  the  value  of  x  chosen  is  not 
indicative  of  the  actual  underlying  mechanism  then  that  particular  value 
should  not  be  used  and  the  model  assumptions  should  be  checked  for 
violations. 

The  choice  of  m  can  also  be  made  by  GCV,  see  Lucas  (1978)  and  Wahba  and 


Wendelberger  (1980). 


5.  Algorithm 

The  user  must  supply  N  independent  variables,  t-feRd,  i*l,...,N,  and  their 
corresponding  dependent  variables,  z^eR**,  1*1, ...,N  to  compute  the  LSS  at  a 
point  teRd.  Assume  that  the  model  described  in  Section  2  holds.  In 
particular,  assume  the  independent  variables  tf  are  known  without  error  and 
the  dependent  variables  z\  consist  of  the  true  function  value  at  tj,  f(tj), 
plus  "noise,"  e-f,  zj  =  f(t<)  +  ej.  The  e^  are  independent  with  finite 
variance  cr2aj2,  o1  an  unknown  constant. 

To  produce  the  coefficients  c  and  d  needed  to  evaluate  the  spline  we 
solve  the  linear  system  of  equations 

(K  ♦  NA*Da2)c  +  Td  *  z 
and 

Ttc  *  0  . 

In  this  system  A*  is  the  optimal  value  of  the  smoothing  parameter  A  as 
determined  by  the  generalized  cross  validation  function.  If  A*  is  known  then 
the  solution  of  the  above  linear  system  could  be  accomplished  for  relatively 
large  values  of  N.  However,  It  is  usually  unknown  and  must  be  calculated  in 
order  to  solve  the  system  of  equations. 

The  method  currently  used  to  determine  A*  requires  the  solution  of  a 
symmetric  N-M  dimensional  eigenvalue-eigenvector  problem.  This  is  the  current 
computational  barrier  to  solving  problems  with  large  numbers  of  observations. 

The  algorithm  presented  in  Wahba  and  Wendelberger  (1980)  requires  the 
inversion  of  a  matrix  of  order  M  and  two  eigenvalue-eigenvector  decompositions 
of  symmetric  matrices,  one  N  by  N  and  the  other  (positive  definite) 
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N-M  by  N-M.  The  algorithm  presented  here  requires  the  solution  of  a 
triangular  system  of  order  M,  the  QR-decomposition  of  an  N  by  M  matrix  and  the 
singular  value  (or  eigenvalue-eigenvector)  decomposition  of  a  symmetric 
positive  definite  N-M  by  N-M  matrix.  This  algorithm  is  faster  and  requires 
fewer  operations,  primarily  because  of  the  replacement  of  one  N  by  N 
eigenvalue-eigenvector  decomposition  by  the  QR-decomposition  of  an  N  by  M 
matrix  (M  <  N). 

This  algorithm  provides  for  replicated  points.  A  replicated  point  is  one 
for  which  there  is  more  than  one  observation  of  the  dependent  variable  for  a 
particular  value  of  the  independent  variable.  Let  the  total  number  of  unique 
(independent  variable)  points  be  Nf|  and  define  N0  =  N  -  M  -  N(|.  Then  the 
computational  algorithm  is  as  follows: 

(i)  Compute  Ta  «  Do^T. 

( i 1 )  Perform  the  QR-decomposition  described  in  Dongarra,  et  al.,  (1979),  of 
Ta. 

Ta  3  (Ql.Q2)  x  (RT>0)T  . 

(iii)  Calculate  B  =  Q2TDa"lKDa_1Q2  . 

(iv)  Decompose  8  =  (Uj ,U2)0g' (Uj .Ug)^  , 

using  the  singular  value  decomposition  of  B,  as  described  by  Golub 
and  Reinsch  (1970)  or  using  the  spectral  decomposition  of  B  as 
described  by  Smith,  et  al.,  (1976);  where 
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Og 1  -  diagonal  matrix  of  the  eigenvalues  (t>i)  of  B,  which  is  of 
dimension  N-M  by  N-M, 

Og  -  diagonal  matrix  of  the  positive  eigenvalues  (bj)  of  B,  which 
is  of  dimension  N^  by  Nr, 

Ul  -  the  eigenvectors  of  the  positive  eigenvalues  of  B,  which  is  of 
dimension  (N-M)  by  Nj|,  and 

U2  -  the  eigenvectors  of  the  zero  eigenvalues  of  B,  which  is  of 
dimension  (N-M)  by  N0. 

(v)  Form  w  *  Ui^Q2^D(j"lz, 

*T  -  (wi,...,*Nn)  . 

(vi)  Obtain  A*  as  the  minimizer  of 


! 
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(vil)  Calculate 

Do-^UiDB-^lTVza  .  *  -  0 

Do^QzUiCfDB+NXIJ-^UiTQgTzg  , 
c  * 

0  <  X  <  •  and  N-M  =  Nty 

(5.2) 

Da-1Q2UlC(DB+NXI)-l-(NX)-ll]UiTQ2TZo 

+(NX)-lD0-1Q2Q2Tz0  ,  0  <  X  <  «  and  N-M  *  NN  . 

0  ,  X  =  •  . 

( vi 1 i )  Solve  the  triangular  system. 

Rd  «  qiT0o“l(z  -  Kc)  for  d, 

dT  3  (di . . 

An  important  aspect  of  this  method  is  the  relatively  small  cost  of 
reconstructing  a  new  LSS  using  the  identical  independent  variables  while 
changing  only  the  dependent  variables.  To  see  this  notice  that  the  bulk  of 
the  computational  effort  is  in  steps  (i)  through  (iv)  which  do  not  require 
knowledge  of  the  dependent  variables.  These  steps  depend  upon  the  independent 
variables  and  Dff.  To  construct  a  second  LSS  with  the  same  independent 
variables  and  identical  D0  we  need  only  save  the  matrices  Uj,  D0,  Db,  Qi,  Q2 
and  R.  With  these  matrices  we  perform  steps  (v)  through  (vili)  to  produce  a 
spline  for  another  set  of  dependent  variables,  say  z' ,  with  little  additional 
computational  effort. 

The  fact  that  obtaining  another  spline  from  z'  is  easy  requires  further 
consideration.  It  Is  made  possible  because  of  the  necessity  to  minimize  the 


r 
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GCVF.  This  minimization  provides  the  mechanism  to  easily  calculate  c  and  d  In 
steps  (vli)  and  (viii)  of  the  algorithm.  If  X*  was  somehow  known  a  priori 
then  we  could  go  right  ahead  and  solve  the  linear  system  (2.8)  and  (2.9)  at  a 
much  less  one  time  cost.  However*  even  with  X*  known,  if  we  had  many  new  data 
sets  z'  then  for  some  number  of  them  it  indeed  would  be  easier  to  do  the 
spectral  decomposition  once  and  for  all. 

Instead  of  saving  Ui,  Da,  Dg,  Qj,  Q2  and  R  we  actually  save  Q2U1,  D0,  Dg, 
QlT0o-1K  and  the  QR-decomposition  of  T0  to  retrieve  R,  Q2Q2T  and  Qj.  By  using 
these  matrices  we  can  perform  steps  (v)  through  (viii)  quite  inexpensively. 

The  QR-decomposition  can  be  stored  in  the  storage  which  has  been  allocated  for 
T0  plus  M  additional  storage  locations.  QiTDa-1K  is  retained  so  that  it  is 
unnecessary  to  reevaluate  K. 
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6.  Example  2--Franke's  Principal  Test  Function,  d»2. 

Example  2  is  a  Monte  Carlo  experiment  to  demonstrate  the  surface  (d*2) 
which  may  be  obtained  by  using  an  LSS  with  GCV.  The  "principal  test  function" 
of  Franke  (1979)  is  used  as  the  true  function  f.  This  surface  consists  of  two 
Gaussian  peaks  and  one  Gaussian  dip  superimposed  on  a  surface  sloping  towards 
the  first  quadrant.  The  surface  is  defined  by 

f(x,y)  =  -75  exp  -[[(9x-2)2+(9y-2)2]/4] 

+  .75  exp  -[[(9x+l)2/  49]+[(9y+l)/10]] 

+  .50  exp  -[[(9x-7) 2+(9y-3)2]/4] 

-  .20  exp  -[(9x-4)2+(9y-7)2] 

A  plot  of  the  surface  f  is  given  in  Figure  6.1. 

The  surface  is  reconstructed  from  169  "noisy"  observations  on  the  grid 
2j-l  2 k-1 

G  .  {ti|tH - , - ),  1*13(J-l)+k;  j ,k*l . 13}  . 

"  "  26  26 

The  “noisy"  observations  are 

*1  *  f(tl)  ♦  ei  with  ei'N(0,o2),  1*1,... ,169,  a2=(.03)2. 

The  ej  are  generated  by  the  pseudo  random  number  generator  RAENBR  at  the 
Madison  Academic  Computing  Center,  MACC  (1978).  The  LSS  with  m=*2  and  the 
smoothing  parameter  chosen  by  GCV  is  plotted  in  Figure  6.2.  The  closeness  of 
fit  can  be  qualitatively  seen  by  overlaying  Figure  6.2  on  Figure  6.1. 

For  this  example  the  calculated  oe2*(.026)z,  (using  (4.6)),  compares 
favorably  with  the  true  a2*(.03)2.  Using  <re2  to  obtain  confidence  Intervals 
for  the  true  curve  at  the  grid  points  G  as  in  Wahba  (1981)  gives  the  95% 
confidence  Intervals 
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Figure  6. 
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g\*(ti)  ±  1.96ae0i(aii(X*))1/2  ,  1-1 . N. 

Figure  6.3  gives  the  cross  section  along  the  grid  showing  the  true  curve, 
spline  fit,  observation  and  95%  confidence  Interval  at  each  point  for  each 
value  of  xf ,  i*l . 13. 

The  number  of  95%  confidence  intervals  which  cover  the  true  surface  is 
known  because  the  true  surface  is  known.  For  this  example  162  or  95.9%  of  the 
intervals  cover  the  true  surface.  This  is  a  favorable  comparison  since  the 
expected  number  is  161.  This  example  was  not  chosen  because  of  this  agreement 
but  rather  was  the  only  one  run  by  prior  decision. 

The  example  given  here  uses  points  on  a  grid  only  for  clarity  of  display. 
For  other  d=2  Monte  Carlo  results  see  Wahba  and  Mendel berger  (1980).  The 
meteorological  example  given  there  uses  Irregularly  spaced  points. 


3  Cross  sec 


Figure  6.3:  13  Cross  sections  of  Example  2--true  curve  (solid  line),  spline  fit  (dashed  line) 
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7.  Example  3— Deri vatl ves  and  Outliers,  d=3. 

Example  3  is  a  Monte  Carlo  experiment  with  d=3  and  true  function 

f(xl,X2,X3)=(2II)-3/2  exp  [(xi2+4x22+9x32)/(-2)]. 

Contours  of  f,  f’  and  f"  are  given  as  the  solid  lines  in  Figures  7.4,  7.5  and 
7.6. 

Three  hundred  points  ti,  1=1,..., 300  are  taken  from  a  uniform 
distribution  in  R={ (xi ,X2,X3) |-2<xi<2,  -1<X2<1,  -2/3<x3<2/3}.  The  true 
function  f  is  evaluated  at  each  of  the  points  t-j  and  added  to  a  Gaussian 
pseudo  random  variable  with  standard  deviation  a=.0025  to  yield  observation 
z-j.  The  peak  height  of  f  is  approximately  .0634.  a  is  roughly  41  of  the  peak 
height  and  therefore  these  data  have  a  “typical"  noise  level. 

A  value  of  m=4  was  chosen  for  this  example  in  order  that  the  second 
derivative  of  the  spline  could  be  used  as  an  estimate  of  the  second  derivative 
of  f.  If  k  is  the  order  of  the  derivative  desired  then  2m-2k-d  must  be 
positive.  Here  2x4-2x2-3  =  1  >  0  and  so  the  second  derivative  of  the  LSS  will 
be  a  good  estimate  of  the  second  derivative  of  f;  for  details  see  Wahba  and 
Wendelberger  (1980). 

The  estimate  ae  for  this  experiment  is  .0024  which  agrees  nicely  with  the 
true  value  of  .0025. 

Contours  of  the  true  function  and  the  fitted  spline,  g**,  are  plotted  in 
Figure  7.4  for  4  values  of  X3.  Because  of  the  symmetry  of  the  true  surface  it 
was  not  plotted  for  negative  values  of  X3.  The  true  function  and  the  fitted 
spline  are  close  to  one  another  near  the  center  of  the  region  and  this 
closeness  degrades  as  we  approach  the  boundary  in  each  of  the  three 
di rections. 
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The  contours  of  the  derivatives  of  f  and  gx*  with  respect  to  xj,  X2  and 
X3  are  given  in  Figures  7.5a,  7.5b  and  7.5c,  respectively.  The  contours  of 
the  second  derivatives  of  f  and  gx*  with  respect  to  xixi,  xix2,  xix3,  X2X2, 
X£X3  and  X3X3  are  given  in  Figures  7.6a,  7.6b,  7.6c,  7.6d,  7.6e,  and  7.6f, 
respectively.  The  same  qualitiative  behavior  is  displayed  by  these 
derivatives  as  of  the  function  with  the  degradation  occurring  relatively  more 
rapidly  as  the  boundary  is  approached.  Figure  7.6f  which  is  ( 32)/( 3x38x3)  of 
f  and  gx*  displays  a  particularly  good  fit  near  the  center  of  R. 

LSS's  may  be  utilized  to  detect  outliers  in  multidimensional  noisy  data 
provided  that  the  model  of  Section  2  is  (nearly)  appropriate.  The  model 
requires  that  the  observations  are  unbiased,  i.e.,  that  Ez=f.  The  errors 
should  be  additive  and  have  a  known  relative  error 'structure,  Dc.  For  the 
purpose  of  the  outlier  study  here  we  shall  further  assume  that  each  error  e-| 
has  a  Gaussian  distribution. 

To  what  extent  the  assumption  of  normality  may  be  relaxed  in  practice 
requires  further  study.  The  smoothness  assumption  requires  that  f(t)  is 
a  smooth  function  of  t.  This  rules  out  ’’cliff"  functions  or  those  with 
discontinuities.  By  using  a  probability  plot  of  the  residuals  the  example 
discussed  here,  which  satisfies  the  above  requirements,  will  be  used  to 
demonstrate  an  outlier  detection  method. 

Data  sets  with  outliers  need  to  be  constructed.  To  accomplish  this 
choose  the  two  points  of  tf,  1*1,. ..,300  which  are  nearest  to  and  farthest 
from  the  origin,  which  Is  the  center  of  the  data  region.  These  two  points  are 
t|<  =  (-.056,  -.032,  -.042)  and  t]  =  (1.985,  -.879,  -.325),  respectively.  To 
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Figure  7.6a:  Example  3— solid  line  Is  d2f/dxi2,  dashed  line  Is  d2g/dxi2 


Figure  7.66:  Example  3— solid  line  is 


,  dashed  line  Is  d2g/dxidx2. 


49 


construct  data  sets  Z|<s,  let  each  element  of  equal  the  corresponding 
element  of  z  except  for  the  kth.  The  kt*1  element  Is  set  equal  to  f (t jt)  -  so, 
o=.0025.  Construct  Z]S  analogously  except  that  the  1th  element  becomes 
f ( t i )  +  so. 

With  the  data  sets  Z|(S  and  zis  probability  plots  in  Figures 
7.7  and  7.8  were  obtained  with  MINITAB,  Ryan,  Joiner  and  Ryan  (1976).  The 
probability  plot  is  constructed  by  ordering  the  residuals  rj  from  smallest  to 
largest  and  plotting  them  against  their  corresponding  normal  scores.  The  i**1 
smallest  normal  score  as  used  by  MINITAB  is  the  ( i -3/8) /300 . 25  percentage 
point  of  the  normal  or  Gaussian  distribution.  If  the  error  distribution  that 
Is  postulated  in  the  model  is  the  correct  one,  then  the  probability  plot 
should  be  nearly  linear.  In  the  data  sets  constructed  here  the  error 
distribution  is  not  correct  because  the  kth  or  1th  point  is  biased  and 
contains  no  random  component. 

The  numbers  in  Figures  7.7  and  7.8  indicate  how  many  points  are  plotted 
at  that  spot  on  the  graph.  An  asterisk  indicates  one  point  and  a  plus  sign 
indicates  that  more  than  9  points  are  overlapping.  In  Figures  7.7b,  c  and  d 
the  outlier  is  identified  as  the  point  which  is  separate  from  the  points  which 
form  the  line.  As  the  assumption  of  unbiasedness  is  more  strongly  violated  it 
shows  up  more  obviously  in  the  plot. 

Figures  7.8a-d  demonstrate  that  this  outlier  detection  scheme  Is  not 
Invincible  and  should  be  used  in  conjunction  with  other  diagnostic  checks. 

The  point  t]  has  very  high  leverage  because  it  is  on  the  boundary  of  the  data 
region.  In  linear  regression  this  is  analogous  to  the  points  at  the  extremes 
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.7a:  Residuals  vs.  normal  scores  for  one  outlier,  f(t|<)  -  Oo,  at  . 


.0060+ 


.0025+ 


-.0010+ 


-.0045+ 


33* 

564* 

292 

++« 

++ 

3++* 

5++ 

+4+ 

3+44 

397 

•464 

*3 

*22 

** 


-.0000+  * 

I  I  t  t  .  J. 

-3.0  -1.5  .0  1.5  3.0  4.5 

NORMAL  SCORE 


,7b:  Resdluals  vs.  normal  scores  for  one  outlier,  f(t|<)  -  6o,  at  t^. 

  -  A 


51 


.0120 


R 

E 

S 

I 

0 

u 

A 

L 


.0060 


.0000 


.0060 


.0120 


.0180 


*76442 

|Q 

3+++ 


60+* 

446* 


*23 


-3.0  -1.5  .0  1.5  3.0  4.5 

NORMAL  SCORE 

Figure  7.7c:  Residuals  vs.  normal  socres  for  one  outlier,  f (t^ )  -  lOo,  at  t^ 

.015+ 


R 

E 

S 

I 

0 

u 

A 

L 


.005+ 


22** 

4* 


++*9764 


679++* 


-.005+  **2344 

_  • 


-.015+ 


-.025+ 


-.035+  * 


-3.0  -1.5  .0  1.5  3.0  4.5 

NORMAL  SCORE 


Figure  7.7d:  Resdiuals  vs.  normal  scores  for  one  outlier,  f(t|<)  -  20o,  at  t ^ . 
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Figure  7.8a:  Residuals  vs.  normal  scores  for  one  outlier,  f(tj)  +  Oo, 
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Figure  7.8b:  Resdiuals  vs.  normal  scores  for  one  outlier,  f(tj)  +  6o,  at  t] 
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of  the  Independent  variable  range  which  also  have  high  leverage.  Because  of 
this  the  residual  at  t-\  Is  not  large  and  does  not  show  up  in  the  probability 
plots  of  Figures  7.8a-d.  The  leverage  at  t^  Is  so  large  that  it  causes 
another  point,  the  one  in  the  lower  left.  In  Figure  7.8d  to  appear  as 
descrepant.  The  probability  plot  provides  a  technique  to  check  model 
assumptions.  However,  as  demonstrated  here,  this  technique  should  be  used  in 
conjunction  with  other  diagnostic  checks  and  with  a  good  understanding  of  the 
pitfalls  which  may  be  encountered. 

Another  diagnostic  check  which  may  be  employed  here  is  to  plot  the 
residuals,  r-j ,  against  the  distance  from  t-(  to  t].  This  is  analogous  to 
plotting  the  residuals  against  the  Independent  variable  In  simple  linear 
regression.  If  a  nonrandom  pattern  is  observed,  such  as  serial  correlation, 
then  we  have  evidence  that  some  model  assumption  is  being  violated.  In 
practice,  t]  Is  unknown  and  hence  it  may  be  necessary  to  oo  all  possible 
plots,  1=1,... ,N. 

If  a  scaling  0CT  had  been  used  then  the  scaled  residuals  Da~lr  would  be 
plotted  Instead  of  r. 

The  procedure  described  here  is  a  diagnostic  method  by  which  some  of  the 
model  assumptions  may  be  checked.  Irregularly  spaced  multidimensional  “noisy" 
data  easily  mask  outliers.  This  technique  provides  a  means  which  may  detect 
these  discrepant  observations.  It  Is  presented  here  in  the  hope  that  it 
becomes  a  routine  method  to  check  for  model  violations  in  an  analysis  which 
uses  LSS's. 

The  three  dimensional  results  presented  here  are  new  and  quite  promising. 
A  quantitative  measurement  of  the  goodness  of  fit  of  the  estimated  spline  and 
Its  derivatives  to  the  true  function  is  given  In  Wendelberger  (1981).  Further 
Monte  Carlo  experiments  will  be  performed  In  3  and  more  dimensions. 


f 


t 


55 


8.  Running  the  program 

To  evaluate  an  LSS  at  any  point,  teRd  involves  the  execution  of  two 
computer  programs.  The  first  of  these,  called  MAIN,  produces  the  coefficients 
of  the  spline.  The  second,  called  EVALUATE,  produces  the  spline,  g^m.xd)* 

If  2m-2k-d  is  positive  EVALUATE  may  also  be  used  to  produce  the  first  (k*l )  or 
second  (k=2)  derivative  of  gN,m,x*  Depending  upon  the  particular  problem  at 
hand  the  user  specifies  different  options  to  be  exercised  by  the  program. 

These  options  will  be  explained  card  by  card  below.  Card  1  will  be 
abbreviated  Ci  and  the  commands  are  summarized  in  Table  8.1  with  an  example 
runstream  given  in  Table  8.3. 

Cl  is  used  to  specify  whether  or  not  the  coefficient  arrays  c  and  d  and 
the  matrices  X  and  P  used  to  reconstruct  the  spline  are  written  to  unit  13.  X 
contains  the  values  of  the  independent  variables  and  P  contains  the  exponents 
of  the  polynomials  In  (2.5),  where  P  is  rigorously  defined. 

To  accomplish  storing  the  spline  in  unit  13  Cl  should  have  SS13  in 
columns  1  through  4.  If  EVALUATE  is  not  going  to  be  run  then  the  contents  or 
unit  13  will  be  unused.  In  this  case  Cl  should  be  PONT. 

Someone  other  than  the  casual  user  may  require  other  arrays  and  matrices 
which  are  also  written  to  unit  13.  See  subroutine  WRT13  in  Wendelberger 
(1981)  for  details  on  the  arrays  and  matrices  which  are  written  to  unit  13. 

C2,  to  be  described  in  the  next  paragraph,  writes  into  unit  14.  See 
subroutines  AWRT14  and  BWRT14  to  determine  the  specific  values  which  are 


written  to  unit  14 
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TABLE  8.1 


Input  for  MAIN 

CARD  POSSIBLE  VALUES  FORMAT 

1  SS13,  DONT  A4 

2  SM14,  UM14,  OONT  A4 

3  SRI 5,  SP15,  VL15,  DONT  A4 

4  MGCV,  USEL  A4 

4+  (x)(Insert  if  C2  is  USEL.)  E15.8 

5  VARI,  STAN,  SAME  (Omit  if  C2  is  UM14.)  A4 

6  (d,N,m)  (Omit  if  C2  is  UM14.)  315 

7  Format  of  cards  C8+1,...,C8+N.  18A4 

8+1  (zj,  tjT,  oj  or  «i2) 

.  (zi)  (If  C2  is  not  UM14.) 


(Zi,  t|T,  <j1  or  at2)  (If  C5  is  STAN  or  VARI.) 

.  Format  is  provided  on  C7.  (See  C7) 

8+N  (zN,  t|tT,  aft  or  oft2) 

9  YES,  NO  A4 
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C2  provides  the  ability  to  store  certain  matrices  In  unit  14  by  using 
SM14  in  columns  1  through  4.  The  storage  of  these  matrices  makes  It 
unnecessary  to  perform  the  bulk  of  the  computations  if  a  second  analysis  is  to 
be  performed.  However,  only  the  dependent  variables  may  be  changed  for  such  a 
subsequent  analysis.  The  relative  variances  or  standard  deviations  must  be 
identical  to  the  run  which  used  SM14  on  C2. 

UM14  in  the  first  four  columns  of  C2  provides  for  use  of  the  matrices 
which  have  previously  been  stored  in  unit  14.  If  the  value  of  C2  Is  PONT  then 
the  matrices  are  neither  stored  nor  used. 

C3  provides  a  means  to  retrieve  certain  information  during  the  execution 
of  MAIN  and  to  store  this  information  In  unit  15.  The  first  four  columns  of 
C3  must  be  SR15,  SP15,  VL15  or  PONT.  If  C3  Is  SR15  the  residuals 
r  ■  (z-9N,m,x(t))  «i*e  stored  in  unit  15  with  the  format  (G24.18).  If  C3  Is 
SP15  the  ordinate  and  abclssa  for  each  point  of  the  plot  of  the  GCVF  as  given 
In  the  output  are  stored.  First  the  number  (n)  of  pairs  is  stored  In  15 
format  followed  by  the  ordered  pairs  (1  ,ln(V(10ai+b))),  where  1  is  an  Index 
number  i=l,...,n  and  In  Is  the  natural  logarithm;  the  format  used  is 

(I3.G24.18).  If  C3  Is  VL15  then  bj/N,  1*1 . N-M  with  format  (G24.18) 

followed  by  w  with  with  the  same  format  are  stored.  If  none  of  the  above 
are  to  be  stored  then  C3  should  be  PONT. 

The  value  of  MGCV  on  C4  causes  the  GCVF  to  be  minimized  to  determine  A*. 
If  the  user  wants  to  supply  a  value  of  X  then  the  value  of  C4  should  be  USEL. 
In  that  case  C4+  is  used.  C4+  should  contain  the  value  of  x  In  (E15.8)  format 
to  be  stored  in  a  single  precision  variable.  If  C4  Is  MGCV  then  C4*  should 
not  be  Included  In  the  Input  stream. 
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C5  Is  not  used  If  the  value  of  C2  is  UM14.  Otherwise  C5  is  used  to  input 
relative  variances  or  relative  standard  deviations  or  neither  of  these  for  the 
errors  of  the  dependent  variable.  If  the  relative  variances  are  to  be  read 
then  C5  should  be  VAR I;  if  the  relative  standard  deviations  are  read  then  C5 
is  STAN;  and  If  neither  Is  read  then  C5  Is  SAME.  The  value  SAME  is  equivalent 
to  that  of  entering  all  l's  as  the  relative  variances.  However,  if  SAME  is 
used  then  the  program  circumvents  both  multiplication  and  division  by  1  since 
Da  is  simply  the  Identity  matrix. 

C6  is  not  used  if  C2  is  UM14.  Otherwise  C6  reads  In  the  number  of 
independent  variables  (=  dimension),  the  number  of  observations  N  and  the 
value  of  m  to  be  used.  The  format  used  is  (315). 

C7  contains  the  format  to  be  used  to  read  in  the  data  values.  The  format 
should  require  at  most  72  spaces  including  the  left-  and  right-most 
parentheses. 

The  data  follow  in  C8+1  through  C8+N.  The  data  should  be  real  Fortran 
variables,  each  data  line  should  contain.  In  order,  the  dependent  variable, 
the  Independent  varlable(s)  and  the  relative  variance  or  standard  deviation  if 
used.  If  C2  Is  UM14  then  C8+1  through  C8+N  should  contain  only  the  dependent 
variables.  They  should  be  given  In  the  Identical  sequence  as  the  dependent 
and  independent  varlable(s)  were  when  C2  had  the  value  SM14. 

The  last  card  to  be  read  is  C9.  It  should  contain  one  of  the  values  YES 

or  NO _ .  If  YES  then  experimental  confidence  Intervals  are  provided  along 

with  degrees  of  freedom  and  an  estimate  of  the  variance  (Wahba,  (1981)).  If 
NO  then  these  values  are  neither  computed  nor  printed. 
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To  evaluate  the  spline  (k  *  0),  first  derivative  (k  »  1)  or  second 
derivative  {k  *  2)  the  program  EVALUATE  Is  used.  Previous  to  running  EVALUATE 
the  program  MAIN  must  have  been  run  with  Cl  writing  the  coefficients  to  unit 
13  (Cl  must  have  been  SS13).  EVALUATE  will  then  read  the  matrices  from  unit 
13  and  calculate  the  spline.  Its  first  derivative  or  second  derivative.  The 
kth  derivative  (k  *  1  or  k  =  2)  will  be  calculated  only  if  2m-2k-d  is  greater 
than  0.  A  description  of  the  input  stream  for  EVALUATE  is  given  In  Table 
8.2  with  a  sample  runstream  given  in  Table  8.3. 

Cl  contains  two  integer  values  in  (215)  format.  The  first  Integer,  N', 
specifies  the  number  of  points  teRd  at  which  the  function  is  to  be  evaluated. 
The  second  integer  should  be  one  of  0,  1  or  2  depending  upon  whether  the 
spline,  first  or  secc.id  derivative,  respectively.  Is  to  be  calculated. 

The  second  card  contains  the  format  to  be  used  to  read  In  the  N'  points. 
The  format  should  require  at  most  72  spaces.  Including  the  left-  and 
right-most  parentheses.  The  Independent  variables  are  read  line  by  line  In 
the  same  sequence  as  that  which  was  used  to  calculate  the  coefficients. 

C3  must  be  either  SV15  or  PONT.  To  store  the  values  In  unit  15,  C3 
should  be  SV15.  This  causes  the  values  followed  by  the  corresponding 
Independent  varlable(s)  to  be  written  to  unit  15.  If  C3  Is  PONT  then  the 
values  are  not  written  to  unit  15, 

C3+  Is  used  only  If  C3  Is  SV15.  Then  C3+  should  have  the  format  which  Is 
to  be  used  to  write  the  calculated  value(s)  followed  by  the  Independent 
varlable(s)  Into  unit  15.  This  format  may  have  at  most  72  spaces  Including 
both  the  left-  and  right-most  parentheses. 


TABLE  8.2 


Input  for  EVALUATION 


CARD 

POSSIBLE  VALUES 

FORMAT 

1 

(N*,k) 

215 

2 

Format  to  read  C4+1 . C4+N'. 

18A4 

3 

SV15.D0NT 

A4 

3+ 

Format  for  15  (Omit  If  C2  Is  DONT.) 

18A4 

4+1 

• 

Independent  variable  points 

• 

of  evaluation,  tT. 

(See  C2) 

• 

Format  Is  provided  In  C2. 
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TABLE  8.3 


Sample  Runstreams 


Comments 


@XQT  SM00TH*SP  L I NE . MA I N 

SS13 

SM14 

DONT 

MGCV 

SAME 


1  24  2 
(F3.10.33X.F4.0) 
@ADD  DATA. 

YES 


Implements  the  MAIN  program. 

Stores  the  spline  coefficients  In  unit  13. 
Stores  matrices  in  unit  14. 

Doesn't  store  other  values. 

Minimize  the  GCVF  to  determine  X*. 

The  relative  variances  are  all  the  same. 
One  dimension,  24  observations,  m«2. 

Format  of  the  input  data. 

Inserts  data  from  Table  3.1  In  runstream. 
Provide  confidence  Intervals. 


@XQT  SM00TH*SPL I NE . EVALUATE 
200  0 
(36X.F8.4) 

SV15 

(2E15.8) 

9ADD  PLOTDATA. 


Implements  the  EVALUATION  program. 

At  200  points  evaluate  the  spline. 

Format  of  the  Independent  variables. 

Store  the  spline  and  independent  variable 
values  in  unit  15. 

Format  of  above. 

Inserts  abclssa  points  to  be  used  for 
plotting. 


(BXQT  SM00TH*SPL  I  NE .  MA  I N 

SS13 

UM14 

DONT 

USEL 

.00016EOO 

(F3.0) 

@ADD  DATA. 

YES 


Implements  the  MAIN  program. 

Stores  the  spline  coefficients  in  unit  13. 
Uses  the  matrices  stored  in  14  by  MAIN 
above. 

Doesn't  store  other  values. 

Use  the  following  value  of  X. 

Value  of  X  to  be  used. 

Format  of  the  dependent  variables. 

Inserts  data  from  Table  3.1. 

Provides  confidence  Intervals. 


@XQT  SM00TH*SPL I NE. EVALUATE 
200  0 
(36X.F8.4) 

SV15 

(2E15.8) 

@ADD  PLOTOATA. 


Implements  the  evaluation  program. 

At  200  points  evaluate  the  spline. 

Format  of  the  Independent  variable. 

Store  the  spline  and  Independent  variable 
In  15. 

Format  of  above. 

Inserts  abclssa  points  to  be  used  for 
plotting. 


9 


62 

C4+1  through  C4+N'  contain  the  Independent  variable(s)  at  which  the 
function  Is  to  be  evaluated.  These  should  be  In  the  format  given  on  C2.  The 
Independent  varlable(s)  should  be  In  the  same  sequence  as  used  to  obtain  the 
coefficients  with  the  program  MAIN. 

The  programs  MAIN  and  EVALUATE  are  written  In  ASCII  FORTRAN  Level  9R1  and 
are  running  on  the  UNIVAC  1100/80  computer  at  the  University  of  Wisconsin. 

All  calculations  are  performed  in  double  precision. 

The  subroutines  used  by  the  programs  MAIN  and  EVALUATE  are  named: 

AWRT14,  BWRT14,  CALC,  CALO,  CALRES,  CHECKQ,  COLOFK,  CONINT,  DATAR,  DERIV1, 
DERIV2,  E,  EDI,  ED2,  GETASI,  GETBM ,  GETR,  GETROE,  GETTHM,  GRAPHV ,  MAKEB, 
MAKETS,  MINVL1 ,  MINVL2,  MQRDC,  PRINT,  PRNTLM,  RCHECK,  REA013,  SPLINE,  SVDB, 
VARDF,  VLHELP,  VOFL,  WHATDO,  WRT13,  AND  WRT15.  GRAPHV,  MINVL1  and  MINVL2  are 
modeled  after  similar  subroutines  of  the  one  dimensional  smoothing  spline 
program  written  by  Fleisher  (1 979)  and  running  at  the  Madison  Academic 
Computing  Center  (MACC).  A  description  of  the  program  structure  Is  given  In 
Wendelberger  (1981). 

The  following  LINPACK  subroutines  are  also  used  by  the  program  MAIN: 
DAXPY,  DCOPY,  DDOT,  DNRM2 ,  DQRDC ,  DQRSL ,  DROT,  DROTG,  DSCAL ,  DSVDC,  DSWAP  and 
DTRSL.  The  code  for  these  routines  Is  not  Included  here.  It  may  be  found  In 
the  LINPACK  USERS'  GUIDE  by  Dongarra,  Bunch,  Moler  and  Stewart  (1979).  One 
modification  Is  made  In  the  LINPACK  subroutine  DSVDC:  the  parameter  MAXIT  Is 
Increased  from  30  to  60.  This  parameter  sets  the  maximum  number  of  Iterations 
to  be  performed  in  the  algorithm  to  determine  the  singular  values  and  vectors 
of  B  before  termination  due  to  nonconvergence.  Increasing  MAXIT  to  60  Is 
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necessary  because  with  Targe  N,  say  N  >  140,  30  iterations  may  not  be  large 
enough  for  some  problems.  An  example  with  N*150  failed  because  MAXIT-30  was 
too  small.  However,  with  MAXIT«60  example  3  with  N»300  was  successfully  run. 
In  fact  MAXIT»60  has  proved  ample  for  all  examples  tried  to  date.  The  version 
of  the  program  described  here  uses  the  singular  value  decomposition  to  obtain 
the  spectral  decomposition  of  B.  A  new  modified  version  uses  the  EISPACK 
(Smith,  et  al.,  (1976))  routines  DTRED2  and  DTQL2  to  accomplish  this  task  at  a 
much  reduced  cost  and  at  no  loss  in  accuracy.  This  is  because  the  singular 
value  decomposition  does  not  make  use  of  the  symmetry  of  B.  The  EISPACK 
routines  do  make  use  of  the  symmetry  of  B  and  thus  the  cost  of  the 
decomposition  is  roughly  cut  in  half. 
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